It is well known that the number of modes of a kernel density
estimator is monotone nonincreasing in the bandwidth if the kernel
is a Gaussian density. There is numerical evidence of nonmonotonicity
in the case of some non-Gaussian kernels, but little additional
information is available. The present paper provides theoretical and
numerical descriptions of the extent to which the number of modes
is a nonmonotone function of bandwidth in the case of general
compactly supported densities. Our results address popular kernels used
in practice, for example, the Epanechnikov, biweight and triweight
kernels, and show that in such cases nonmonotonicity is present with
strictly positive probability for all sample sizes n ≥ 3. In the
Epanechnikov and biweight cases the probability of nonmonotonicity equals 1
for all n ≥ 2. Nevertheless, in spite of the prevalence of lack of
monotonicity revealed by these results, it is shown that the notion of a
critical bandwidth (the smallest bandwidth above which the number
of modes is guaranteed to be monotone) is still well defined.
Moreover, just as in the Gaussian case, the critical bandwidth is of the
same size as the bandwidth that minimises mean squared error of
the density estimator. These theoretical results, and new numerical
evidence, show that the main effects of nonmonotonicity occur for
relatively small bandwidths, and have negligible impact on many
aspects of bump hunting.

1. Introduction. Compactly supported kernels, particularly the biweight, 
are predominantly used in practice when constructing a nonparametric
density estimator. There are at least two reasons: ease of computation
(calculation is simplified if a curve estimate at a given point uses only
a relatively small fraction of the data); and, more philosophically, a
desire to ensure that
a density estimator uses only local information. However, many “shape” 
properties of kernel density estimators are well understood only in the case 
of infinitely supported, Gaussian kernels. 

Responding to this issue, in the present paper we quantify a range of
properties of non-Gaussian kernels when used to identify bumps in
nonparametric density estimation. Numerical results [e.g., Minnotte and Scott
(1993)] have shown that in practical circumstances some commonly used,
compactly supported kernels may not give rise to nonparametric density
estimators whose modality is a monotone function of bandwidth. However,
theoretical explanations of this property are not available, and neither is it
clear whether the nonmonotonicity property will upset the sorts of
bump-hunting applications to which kernel density estimators are often put. For
example, can the biweight kernel be profitably used to implement
Silverman's (1981) test for unimodality, or does the nonmonotonicity property
interfere at too high a level for this to be feasible?

This paper provides answers to these questions. In Section 2 we give 
general theoretical results that address nonmonotonicity problems arising 
with compactly supported kernels. The results are illustrated theoretically 
in terms of commonly employed “multiweight” kernels, such as the uniweight 
(or Epanechnikov) density, and the biweight and triweight densities.
Nevertheless our results are very general, and apply to a wide range of compactly
supported kernels. Numerical illustrations of theoretical properties are given 
in Section 4. 

To give an example, it follows from our results that when density
estimators are calculated using the biweight kernel, the number of modes of a
kernel density estimator is, with probability 1, a nonmonotone function of
the bandwidth, h, whenever sample size, n, equals two or more.
Interestingly, this result fails for the triweight kernel. In that case, for n = 2 and
with probability 1, the number of modes is monotone in h. However the
probability that it is nonmonotone is strictly positive whenever n ≥ 3.

Results of this type add considerably to the information provided by more
conventional analytical results, such as those of Schoenberg (1950). From
those it may be deduced only that for compactly supported kernels, and
sufficiently large sample sizes, there exist deterministic data constructions
for which the monotonicity property fails. By way of contrast, our
results show that monotonicity fails for the sorts of datasets that arise in
practice, and for a wide range of sample sizes (generally, for n ≥ 3).

These properties lead pointedly to the question of whether the critical 
bandwidth, in the case of compactly supported kernels, is of the same size 
as it would be for a Gaussian kernel. The critical bandwidth is defined as 
the “smallest” bandwidth, in some sense, such that a nonparametric density 
estimator is unimodal. When the Gaussian kernel is used, monotonicity of 
the number of modes as a function of bandwidth means that there is no
ambiguity in the definition of “smallest.” 

The situation is much less clear for compactly supported kernels, however.
Nevertheless, we shall show that provided the kernel is unimodal and concave
at the mode, one may unambiguously define the "smallest" bandwidth
$h_{\mathrm{crit}}$ to be the infimum of values $h_1 > 0$ such that the number
of modes of a density estimator equals 1 for all $h > h_1$. This version of
$h_{\mathrm{crit}}$ is well defined, and strictly positive, with probability 1.
Moreover, the mode tree technology developed by Minnotte and Scott (1993)
[see also Minnotte (1997)] enables this definition to be used in practice
without difficulty.

One of the particularly attractive features of the critical bandwidth for a
Gaussian kernel is that it is of size $n^{-1/5}$, this being the order that produces
optimal mean squared error performance for density estimators in a standard
second-order setting. In Section 3 we show that the same is true for a wide 
range of compactly supported kernels, including the biweight, provided our 
alternative definition of the critical bandwidth is used. In this sense the 
effect of nonmonotonicity of number of modes occurs at a relatively low 
level, and is not so great as to hinder the main features of a kernel density 
estimator. The mathematical argument behind this result is nonstandard, 
since a conventional approach relies on monotonicity, but nevertheless the 
result can be viewed as an extension of its counterpart for a Gaussian kernel. 

All our methods and results have application to problems involving
nonparametric regression, where only minor modifications are necessary. We
have chosen to state them in the context of density estimation since
passing in the reverse direction, from regression to density estimation, is not so
straightforward; see the discussion by Chaudhuri and Marron [(2000), page 213].

There is no problem extending our results to the case of modes in
estimators of density derivatives. As far as bimodality, or multimodality, is
concerned, the main issue of interest is whether the bandwidth above which
monotonicity of the number of modes (as a function of bandwidth) occurs
is one for which the density estimator is multimodal with an appropriate
number of modes. Indeed, if k ≥ 2 is given then it is possible, when using a
compactly supported kernel, that the density estimator will not have at least
k modes for a bandwidth, h, in a range $[h_0, \infty)$ where the number of modes
is monotone in h. (This is relatively likely to occur if the actual density has
strictly fewer than k modes.) This possibility does not arise when k = 1, and
for general k it does not occur when using a Gaussian kernel. As a result,
it is relatively unattractive to use compactly supported kernels in problems
where strict multimodality is being investigated.

There is a diverse and extensive literature on bump hunting in
nonparametric density estimation, much of it starting from contributions of Good
and Gaskins (1980) and Silverman (1981). Formal and informal approaches
to assessing modality include those of Hartigan and Hartigan (1985),
Izenman and Sommer (1988), Roeder (1990, 1994), Cuevas and Gonzales-Manteiga
(1991), Müller and Sawitzki (1991), Minnotte and Scott (1993), Fisher,
Mammen and Marron (1994), Escobar and West (1995), Polonik (1995a, b),
Minnotte (1997), Chaudhuri and Marron (1999, 2000), Cheng and Hall
(1999) and Fisher and Marron (2001). A small number of techniques, for
example, the recent scale-space methods introduced by Chaudhuri and
Marron (1999, 2000), rely on monotonicity of number of modes (as a function of
bandwidth) in order to convey information. However, others, in particular
formal or informal hypothesis testing approaches, require little more than
the notion of a critical bandwidth and therefore suffer hardly at all from
nonmonotonicity; as we show, lack of monotonicity occurs only for
relatively small bandwidths. In these cases, and others, our results indicate that
nonmonotonicity for popular kernels such as the biweight is generally not
a significant problem. This serves to encourage their use in bump hunting
problems.

2. Theory describing nonmonotonicity for non-Gaussian kernels. 

2.1. Preliminaries. We say that a continuous density $f$ (or density
estimator $\hat f$), continuously differentiable on its support, has just k modes if
it has only a finite number of points of inflection on its support, and just
k local maxima $x_1, \ldots, x_k$. The values of $x_j$ are called the modes of $f$. We
say that $f$ is strictly unimodal if $f$ has just one mode in the sense defined
above.

The assumption of continuity is made solely to simplify the definition of a 
density with k modes; it may be weakened. Likewise we may remove the
condition that the density has only isolated points of inflection on its support,
although it should be appreciated that this alters the type of information 
contained in our results. We are not aware of any kernel used in practice 
that violates this condition. 

Given a kernel $K$, bandwidth $h$ and sample $\mathcal{X} = \{X_1, \ldots, X_n\}$,
let $\hat f = \hat f_h$, defined by

(2.1)    $\hat f_h(x) = (nh)^{-1} \sum_{i=1}^{n} K\{(x - X_i)/h\}$,

denote a conventional kernel density estimator. It is clear that if $K$ is strictly
unimodal, continuous on the real line, and supported on a compact interval,
and if the data $X_i$ are distinct, then for all sufficiently small $h$, $\hat f_h$ has just
n modes. We shall say that "the number of modes of the kernel estimator
$\hat f_h$ is not monotone in h" if there exist $0 < h_1 < h_2$ such that the number of
modes of $\hat f_{h_1}$ is strictly less than the number of modes of $\hat f_{h_2}$.
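
The definition can be explored numerically. The following is a minimal sketch, not the authors' implementation: it evaluates the estimator on a grid for a kernel proportional to (1 - u^2)^theta on [-1, 1] and counts local maxima as rises followed by falls. The function names, the grid, and the two-point sample {0, 1} are illustrative assumptions, and a grid count can miss modes narrower than the grid spacing.

```python
import numpy as np
from math import gamma, sqrt, pi

def multiweight(u, theta=1.0):
    """K(u) = C_theta * (1 - u^2)^theta on [-1, 1], zero elsewhere (theta=1: Epanechnikov)."""
    c = gamma(theta + 1.5) / (sqrt(pi) * gamma(theta + 1.0))  # normalising constant
    u = np.asarray(u, dtype=float)
    base = np.clip(1.0 - u**2, 0.0, None)                     # avoid negative bases
    return np.where(np.abs(u) < 1.0, c * base**theta, 0.0)

def kde(x, data, h, theta=1.0):
    """Kernel estimate (nh)^{-1} sum_i K((x - X_i)/h) evaluated on the grid x."""
    data = np.asarray(data, dtype=float)
    u = (x[:, None] - data[None, :]) / h
    return multiweight(u, theta).sum(axis=1) / (len(data) * h)

def count_modes(f):
    """Count local maxima of gridded values: a strict rise followed by a strict fall."""
    s = np.sign(np.diff(f))
    s = s[s != 0]                      # drop exactly flat stretches (outside the support)
    return int(np.sum((s[:-1] > 0) & (s[1:] < 0)))

# Illustrative two-point sample (an assumption of this sketch, not data from the paper).
x = np.linspace(-3.0, 4.0, 7001)
data = [0.0, 1.0]
counts = [count_modes(kde(x, data, h)) for h in (0.4, 0.75, 1.5)]
print(counts)
```

For this sample and the Epanechnikov kernel the counts at h = 0.4, 0.75 and 1.5 are two, three and one, so the number of modes is not monotone in h in the sense just defined.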
2.2. Monotonicity of number of modes for large bandwidths. First we
introduce a unimodality condition:

(2.2)    $K$ is compactly supported and strictly unimodal, and is
         concave in a neighborhood of its mode.

Theorem 2.1 shows that (2.2) ensures $\hat f_h$ is unimodal for all sufficiently
large h.

Theorem 2.1. If (2.2) holds, and if the data $\mathcal{X}$ come from a continuous
distribution, then with probability 1 there exists a bandwidth
$\tilde h = \tilde h(\mathcal{X})$ such that $\hat f_h$ is strictly unimodal for all $h > \tilde h$.

Proofs of Theorems 2.1 and 2.2 are given in Sections 5.1 and 5.2,
respectively. A derivation of Theorem 2.3 is similar. Theorem 2.1 implies that the
"critical bandwidth," given by

(2.3)    $h_{\mathrm{crit}} = \inf\{h_1 > 0 : \hat f_h \text{ is unimodal for all } h > h_1\}$,

is well defined with probability 1. Moreover, assuming the sampled
distribution is continuous, $P(h_{\mathrm{crit}} > 0) = 1$. Throughout the paper,
$h_{\mathrm{crit}}$ is given by (2.3). Minnotte and Scott's (1993) mode tree algorithm
permits calculation of $h_{\mathrm{crit}}$. Without the algorithm, checking large
bandwidths to see if the corresponding density estimator was unimodal could be
computationally difficult.
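
As a rough illustration of definition (2.3), the sketch below approximates the critical bandwidth by brute-force grid search rather than by the mode tree algorithm: it scans a bandwidth grid downward and stops at the first bandwidth whose estimate is not unimodal. The Epanechnikov kernel, the grids and the three-point sample are assumptions made for the example, and the answer is accurate only to the grid spacing.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def n_modes(data, h, x):
    """Grid count of local maxima of sum_i K((x - X_i)/h); scaling by (nh)^{-1} is irrelevant."""
    f = epanechnikov((x[:, None] - data[None, :]) / h).sum(axis=1)
    s = np.sign(np.diff(f))
    s = s[s != 0]
    return int(np.sum((s[:-1] > 0) & (s[1:] < 0)))

def critical_bandwidth(data, h_grid, x):
    """Smallest grid value h1 such that the estimate is unimodal at every grid h >= h1."""
    data = np.asarray(data, dtype=float)
    h_crit = None
    for h in sorted(h_grid, reverse=True):   # decrease through unimodal bandwidths
        if n_modes(data, h, x) != 1:
            break
        h_crit = h
    if h_crit is None:
        raise ValueError("estimate is not unimodal at the largest bandwidth in h_grid")
    return h_crit

x = np.linspace(-5.0, 5.0, 10001)
h_grid = np.linspace(0.05, 3.0, 296)          # spacing 0.01
h_crit = critical_bandwidth([-1.0, 0.0, 1.0], h_grid, x)
print(round(h_crit, 2))
```

For the sample {-1, 0, 1} a direct calculation with the Epanechnikov kernel gives a critical bandwidth of 1.5, which the grid search reproduces to within its spacing.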

2.3. Theorems applicable to multiweight kernels. Consider the condition 

(2.4)    $K$ is a symmetric and strictly unimodal probability density
         with support equal to $\mathcal{I} = [-1, 1]$, continuous on the real
         line and continuously differentiable on $\mathcal{I}$, has two continuous
         derivatives in $[1 - \varepsilon, 1]$ for some $\varepsilon > 0$, and satisfies
         $K''(x) < 0$ for some $x \in (\tfrac12, 1)$.

Theorem 2.2. If (2.4) holds, if n ≥ 2, and if $\mathcal{X} = \{X_1, \ldots, X_n\}$ denotes
a random sample drawn from a continuous distribution, then with
probability 1 the number of modes of the kernel estimator $\hat f_h$ is not monotone in h.

Any kernel of the form $K(x) = C_\theta (1 - x^2)^\theta$ on $\mathcal{I}$, where
$0 < \theta < 5/2$ and $C_\theta$ ensures $\int K = 1$, satisfies (2.4). This class
includes the uniweight (or Epanechnikov) and biweight kernels.
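
The upper limit 5/2 can be traced to the sign of the second derivative. The short calculation below is an illustrative check rather than part of the formal argument; it uses only the form of the kernel stated above and the requirement in (2.4) that K'' be negative somewhere in (1/2, 1).

```latex
\begin{align*}
K(x)   &= C_\theta (1 - x^2)^{\theta}, \qquad |x| \le 1, \\
K'(x)  &= -2\theta\, C_\theta\, x\, (1 - x^2)^{\theta - 1}, \\
K''(x) &= 2\theta\, C_\theta\, (1 - x^2)^{\theta - 2} \bigl\{ (2\theta - 1) x^2 - 1 \bigr\}, \\[4pt]
K''(x) < 0 \ \text{for some } x \in (\tfrac12, 1)
  &\iff (2\theta - 1) \cdot \tfrac14 < 1
  \iff \theta < \tfrac52, \\[4pt]
C_\theta &= \frac{\Gamma(\theta + \tfrac32)}{\sqrt{\pi}\, \Gamma(\theta + 1)},
  \qquad C_1 = \tfrac34, \quad C_2 = \tfrac{15}{16}, \quad C_3 = \tfrac{35}{32}.
\end{align*}
```

In particular, for the biweight kernel (theta = 2) the second derivative is negative exactly when |x| < 3^{-1/2}, which is approximately 0.577, so the interval (1/2, 1) contains suitable points; for the triweight kernel (theta = 3) it is negative only when |x| < 5^{-1/2}, approximately 0.447, and no point of (1/2, 1) qualifies.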

Theorem 2.2 does not address the triweight case ($\theta = 3$). In fact, when
n = 2, and when $K$ is the triweight kernel and the sampled distribution
is continuous, with probability 1 the number of modes of $\hat f_h$ is monotone
nonincreasing as a function of the bandwidth. This result is available for
more general kernels, too; it is sufficient that (2.2) hold and that $K$ be
a symmetric probability density with support equal to $\mathcal{I}$, continuous on
the real line, twice continuously differentiable on $\mathcal{I}$, with a unique point of
inflection ($\xi$, say) on $(0, 1)$, and such that the only solutions $0 < x_1 \le x_2 < 1$
of the equations $K'(x_1) = K'(x_2)$ and $K''(x_1) = -K''(x_2)$ are $x_1 = x_2 = \xi$.
We ask too that $K$ have $2k \ge 4$ derivatives in a neighborhood of $\xi$, with
$K^{(2j)}(\xi) = 0$ for $1 \le j \le k - 1$ and $K^{(2k)}(\xi) < 0$. These conditions hold with
k = 3 when $K$ is the triweight kernel.

Our next result will show, however, that nonmonotonicity can occur with
the triweight kernel provided n ≥ 3. To this end, put
$\kappa_\xi(x) = K(\xi + x) + K(\xi - x) + K(x)$, and assume that:

(2.5)    $K$ is a symmetric and strictly unimodal probability density
         with support equal to $\mathcal{I} = [-1, 1]$, continuous on the real
         line, four times continuously differentiable on $\mathcal{I}$, and with the
         property that $\kappa_\xi''(0) > 0$, $\kappa_\xi'(\eta) = 0$ and
         $\kappa_\xi''(\eta) > 0$ for some $\xi, \eta \in (0, 1)$.

Theorem 2.3. If (2.5) holds, if n ≥ 3, and if $\mathcal{X} = \{X_1, \ldots, X_n\}$ denotes
a random sample drawn from a continuous distribution, then with strictly
positive probability the number of modes of $\hat f_h$ is not monotone in h.

Any kernel of the form $K(x) = C_\theta (1 - x^2)^\theta$ on $\mathcal{I}$, where
$5/2 < \theta < 11/2$, satisfies (2.5). This class includes the triweight kernel,
for which $\theta = 3$ and appropriate values of $\xi$ and $\eta$ are $\xi = 0.9$ and
$\eta = 0.45$.
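
The quoted values for the triweight kernel can be checked numerically. The sketch below assumes that condition (2.5) is read as requiring a positive second derivative of kappa at 0, a zero first derivative at eta and a positive second derivative at eta, where kappa(x) = K(xi + x) + K(xi - x) + K(x); the helper names are hypothetical and the derivatives are written out by hand for K(x) = (35/32)(1 - x^2)^3.

```python
C3 = 35.0 / 32.0  # normalising constant of the triweight kernel

def k1(x):
    """First derivative of the triweight kernel (zero outside [-1, 1])."""
    return C3 * (-6.0 * x) * (1.0 - x * x) ** 2 if abs(x) < 1.0 else 0.0

def k2(x):
    """Second derivative of the triweight kernel (zero outside [-1, 1])."""
    return C3 * 6.0 * (1.0 - x * x) * (5.0 * x * x - 1.0) if abs(x) < 1.0 else 0.0

def kappa1(x, xi):
    """d/dx of K(xi + x) + K(xi - x) + K(x)."""
    return k1(xi + x) - k1(xi - x) + k1(x)

def kappa2(x, xi):
    """Second derivative of K(xi + x) + K(xi - x) + K(x)."""
    return k2(xi + x) + k2(xi - x) + k2(x)

xi, eta = 0.9, 0.45
print(kappa2(0.0, xi), kappa1(eta, xi), kappa2(eta, xi))
```

Under that reading the three printed quantities should be positive, zero and positive, respectively, so that kappa has local minima at both 0 and eta.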

3. Critical bandwidths and bootstrap tests. 

3.1. Methodology. The "classic" form of Silverman's (1981) bandwidth
test for unimodality is based on computing a critical bandwidth that, in
some sense, is as small as possible subject to the density estimator $\hat f_h$ at
(2.1) being unimodal. If $K$ is a Gaussian density then there can be no
ambiguity in defining the critical bandwidth: the number of modes is a monotone
nonincreasing function of bandwidth, and so for any given dataset there is a
bandwidth below which all density estimators have at least two modes, and
above which all density estimators are unimodal [Schoenberg (1950)].

While this is not generally true for non-Gaussian kernels, that does not
inhibit the definition of critical bandwidth given at (2.3). From a practical
viewpoint it is quite feasible to define $h_{\mathrm{crit}}$, as we do at (2.3), by decreasing
through bandwidths for which $\hat f_h$ is unimodal, although it is generally not
possible to define a critical bandwidth by increasing through bandwidths for
which $\hat f_h$ is multimodal.
Silverman's (1981) bandwidth test for unimodality consists of rejecting
the null hypothesis of unimodality if $h_{\mathrm{crit}}$ is "too large," where the latter is
determined using the bootstrap. Specifically, put $\hat f_{\mathrm{crit}} = \hat f_{h_{\mathrm{crit}}}$,
let $X_1^*, \ldots, X_n^*$ be a resample drawn by sampling randomly (conditional on
$\mathcal{X}$) and with replacement from the distribution with density
$\hat f_{\mathrm{crit}}$, and define

(3.1)    $\hat f_h^*(x) = (nh)^{-1} \sum_{i=1}^{n} K\{(x - X_i^*)/h\}$.

Let $h^*_{\mathrm{crit}}$ denote the version of $h_{\mathrm{crit}}$ in this setting, with $\hat f_h^*$
replacing $\hat f_h$ in the definition of $h_{\mathrm{crit}}$. Given a nominal level $\alpha$ for the
test, the null hypothesis of unimodality is rejected if

(3.2)    $P(h^*_{\mathrm{crit}} / h_{\mathrm{crit}} \le 1 \mid \mathcal{X}) \ge 1 - \alpha$.

The technique, using our definition of $h_{\mathrm{crit}}$, can also be applied to assess
unimodality in a subinterval of the support of a density.
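
The procedure at (3.1) and (3.2) can be sketched as follows for the Epanechnikov kernel. This is a simplified illustration under stated assumptions, not the implementation used for the numerical work: the critical bandwidth is approximated by a coarse grid search (so it can miss narrow modes), resamples from the critical-bandwidth estimate are generated as a data point plus the critical bandwidth times kernel noise, and the kernel noise is drawn as 2B - 1 with B distributed as Beta(2, 2), whose density is proportional to 1 - x^2. All function names and default settings are hypothetical.

```python
import numpy as np

def epanechnikov(u):
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def n_modes(data, h, x):
    f = epanechnikov((x[:, None] - data[None, :]) / h).sum(axis=1)
    s = np.sign(np.diff(f))
    s = s[s != 0]
    return int(np.sum((s[:-1] > 0) & (s[1:] < 0)))

def critical_bandwidth(data, n_h=60, n_x=801):
    """Grid approximation to (2.3): decrease h from above the data range until unimodality fails."""
    lo, hi = data.min(), data.max()
    span = hi - lo
    x = np.linspace(lo - 1.1 * span, hi + 1.1 * span, n_x)
    h_crit = None
    for h in np.linspace(1.05 * span, 0.02 * span, n_h):
        if n_modes(data, h, x) != 1:
            break
        h_crit = h
    if h_crit is None:
        raise ValueError("estimate is not unimodal at the largest bandwidth tried")
    return h_crit

def silverman_test(data, alpha=0.05, n_boot=100, seed=0):
    """Return (h_crit, estimated P(h*_crit <= h_crit | data), reject flag) as in (3.2)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    h_crit = critical_bandwidth(data)
    hits = 0
    for _ in range(n_boot):
        eps = 2.0 * rng.beta(2.0, 2.0, size=n) - 1.0        # Epanechnikov noise
        resample = rng.choice(data, size=n, replace=True) + h_crit * eps
        hits += int(critical_bandwidth(resample) <= h_crit)
    p_hat = hits / n_boot
    return h_crit, p_hat, bool(p_hat >= 1.0 - alpha)

# Illustrative use on simulated unimodal data (not data from the paper).
sample = np.random.default_rng(1).normal(size=40)
print(silverman_test(sample, n_boot=20))
```

The number of resamples and the grid sizes are kept small so that the example runs quickly; a serious application would use finer grids, or the mode tree, and many more resamples.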

3.2. Large-sample properties of critical bandwidth. Assume that:

(3.3)    $f$ has two continuous derivatives when considered as a function
         restricted to its support, which we take to equal $\mathcal{S} = [a, b]$
         where $-\infty < a < b < \infty$; that $f(a) = f(b) = 0$, $f'(a+) > 0$ and
         $f'(b-) < 0$; and that in the interior of $\mathcal{S}$ the equation $f'(x) = 0$
         has a unique solution $x_0 \in (a, b)$, and that $f''(x_0) < 0$.

Assume too that:

(3.4)    $K$ is a symmetric and strictly unimodal probability density
         with support equal to $\mathcal{I} = [-1, 1]$, is continuously
         differentiable on the real line, and has three bounded derivatives when
         viewed as a function defined only on $\mathcal{I}$.

This condition is satisfied by the biweight kernel, for example.

The part of condition (3.3) which asserts that $f$ decreases steeply to zero
at either end of its support serves only to remove the effects of spurious
"wiggles" in the tails of $\hat f_h$. Without such a constraint the size of the
critical bandwidth can be determined by random clusters of data in the tails of
the distribution. In practice such effects are usually excluded by restricting
attention to the body of the distribution when formally testing for
unimodality. However, there is a wide variety of ways of doing this, and for our
purposes it is more appropriate to impose a condition which simply excludes
tail effects. See Mammen, Marron and Fisher (1992) and Silverman (1983)
for further discussion of this issue; they impose a condition close to (3.3).
Given a standard Brownian motion $W$, define the stochastic process

    $\omega(t, u) = f(x_0)^{1/2} u^{-1} \int K\{(t + v)/u\}\, dW(v) - \tfrac12 |f''(x_0)|\, t^2$

for $-\infty < t < \infty$ and $u > 0$. Put $\omega'(t, u) = (\partial/\partial t)\, \omega(t, u)$. Our first result
argues that for all sufficiently large $u$, the stochastic process $\omega(\cdot, u)$ is
unimodal.

Theorem 3.1. Assume (3.4) holds. Then the probability that, for all $u > v$,
$\omega(\cdot, u)$ has a unique local maximum and no local minimum [equivalently,
$\omega'(\cdot, u)$ has a unique downcrossing of 0 and no upcrossing of 0] on the real
line converges to 1 as $v \to \infty$.

It is readily shown that the probability that for some $u > v$, $\omega(\cdot, u)$ has
both a local maximum and a local minimum, converges to 1 as $v \downarrow 0$. This
result and Theorem 3.1 imply that with probability 1 the infimum, $U_{\mathrm{crit}}$ say,
of the set of values $v > 0$ such that, for all $u > v$, $\omega(\cdot, u)$ has a unique local
maximum and no local minimum, is well defined and strictly positive.

Our next result shows that $h_{\mathrm{crit}}$ is asymptotically of conventional size
$n^{-1/5}$, and that the "constant" of proportionality equals $U_{\mathrm{crit}}$.

Theorem 3.2. Assume (3.3) and (3.4) hold. Then with probability 1,
$h_{\mathrm{crit}}$ is well defined for all sufficiently large n, and $n^{1/5} h_{\mathrm{crit}}$
converges in distribution to $U_{\mathrm{crit}}$.

3.3. Properties of bootstrap test. First we describe large-sample
properties of the distribution of $h^*_{\mathrm{crit}}$, conditional on the data. Theorem 3.1
implies the existence of a unique point, $t = T$ say, at which

    $f(x_0)^{1/2}\, U_{\mathrm{crit}}^{-2} \int K'\{(t + v)/U_{\mathrm{crit}}\}\, dW(v) - |f''(x_0)|\, t$

changes sign. Let $W^*$ denote a standard Wiener process independent of $W$,
and put

    $\Omega^*(t, u) = f(x_0)^{1/2}\, u^{-2} \int K'\{(tu + v)/u\}\, dW^*(v)$
        $+ f(x_0)^{1/2}\, U_{\mathrm{crit}}^{-2} \int K'\{(T + tu + v)/U_{\mathrm{crit}}\}\, dW(v) - |f''(x_0)|\, (T + tu)$.

The argument used to prove Theorem 3.1 may be employed to show that
the infimum $U^*_{\mathrm{crit}}$ of the set of $v > 0$ such that, for all $u > v$, $\Omega^*(\cdot, u)$ has
a unique downcrossing of 0 and no upcrossing of 0, is well defined and
strictly positive. The strong approximation argument leading to the proof
of Theorem 3.2 may be used to prove that, assuming both (3.3) and (3.4),
and employing suitable constructions of $W$ and $W^*$,

    $\sup_{-\infty < x < \infty} \bigl| P(h^*_{\mathrm{crit}}/h_{\mathrm{crit}} \le x \mid \mathcal{X}) - P(U^*_{\mathrm{crit}}/U_{\mathrm{crit}} \le x \mid W) \bigr| \to 0$







in probability. 

It follows that the asymptotic level of the test defined at (3.2) is

(3.5)    $\pi(\alpha) = P\{ P(U^*_{\mathrm{crit}}/U_{\mathrm{crit}} \le 1 \mid W) \ge 1 - \alpha \}$.

Note that $0 < \pi(\alpha) < 1$ for each $\alpha$. On the other hand, if the sampled
density is not unimodal then $P(h^*_{\mathrm{crit}}/h_{\mathrm{crit}} \le x) \to 1$ for each $x > 0$, and so the
probability at (3.2) converges to 1. That is, when the null hypothesis is false,
the probability that the test leads to rejection converges to 1 as $n \to \infty$. It is
not true that $\pi(\alpha) = \alpha$, and this equality also fails in the case of a Gaussian
kernel; see Hall and York (2001) for discussion of the size of the error.

4. Numerical properties for non-Gaussian kernels. 

4.1. Distribution of characteristics of nonmonotonicity. The theoretical
results in Section 2 may be illustrated using the mode tree of Minnotte and
Scott (1993), for small samples and various kernels. In particular, the case
n = 3 is treated in Figure 1. There we took the sample to be
$\{X_1, X_2, X_3\} = \{-1, 0, 1\}$. We used $\sigma = 1/3$ for the Gaussian kernel. This gave
effective support similar to that for the compact kernels.

The four panels in Figure 1 correspond, respectively, to the kernels:
(a) Epanechnikov, (b) biweight, (c) triweight and (d) Gaussian. Panels (a)-(c) show
that false modes appear at the points ±1/2 for each of the non-Gaussian
kernels. This leads to nonmonotone behavior in each instance, and in fact as
h increases, 3 modes → 5 → 3 → 1 for the Epanechnikov, 3 → 5 → 2 → 3 → 1
for the biweight and 3 → 4 → 2 → 1 for the triweight. Clearly, the possibility
of nonmonotonicity is very real for these kernels.

To further investigate the mode behavior of multiweight kernels for this
three-point dataset, the number of modes was found for 500 values of h and
480 choices of $\theta$ in the kernel $K_\theta(x) = C_\theta (1 - x^2)^\theta$, ranging from 0.025 to 12.
The result in "mode space" may be seen in Figure 2. The number of modes,
between 1 and 6, is represented by the increasing density of six greyscale
levels, as follows: 1 mode is indicated by the light grey in the north-west
corner of the figure; 2 modes by the slightly darker adjacent region to its
right, not touching any of the figure boundaries; 3 modes by the medium
grey region that covers most of the south-east half of the figure, and also by
the small area against the left-hand figure boundary immediately below the
1-mode region; 4 modes by the small sliver of a region between the 2-mode
and 3-mode areas; 5 modes by the very dark patch which meets the left-hand
figure boundary at values of h between about 0.5 and 1.0; and a very small
region of black, hardly detectable on the figure, representing 6 modes near
$(\theta, h) = (2.5, 1.02)$.

The possibility of finding 6 modes in a density estimate from 3 points is
demonstrated in Figure 3. Panel (a) shows a portion of the mode tree for
the case $\theta = 2.5$, while panels (b) and (c) show the density estimate, in full
and in modal close-up, respectively, for the estimate with h = 1.02 and the
same kernel. The estimate is nearly flat, but six modes appear, ranging from
small to extremely small.

Fig. 1. Mode trees for the n = 3 sample {−1, 0, 1}. Kernels used to produce the
results in panels (a)-(d) are, respectively, (a) Epanechnikov, (b) biweight, (c) triweight and
(d) Gaussian.

Although clearly Figure 2 does not generalize directly to other datasets, it
demonstrates both the complexity of the data-$\theta$-$h$ interactions with respect
to modes, and the ubiquity of modal nonmonotonicity. Even though it is
often assumed that $K_\theta$ provides a good approximation to the normal kernel
for moderate values of $\theta$, the monotonicity property does not appear for this
simple dataset until $\theta$ is close to 11.

Next we investigated the relationship between $h_{\mathrm{crit}}$, defined in Section 2,
and the bandwidth $h_{\mathrm{nonm}}$, defined to be the smallest bandwidth at which
nonmonotonicity appears as h is decreased from $h_{\mathrm{crit}}$. We drew 1000 samples
of size n from the distribution whose density was the Epanechnikov kernel,
this choice being made because there the density estimates suffer in only
minor ways from spurious modes in the tails. For each sample we computed
density estimates using (a) Epanechnikov, (b) biweight and (c) triweight
kernels, and formed the ratio $R = h_{\mathrm{crit}}/h_{\mathrm{nonm}}$. Estimates of the probability
densities of $\log(R)$ are plotted in Figure 4, for n = 10 (dotted line), 100
(dashed line) and 1000 (solid line). For each estimate, the Gaussian kernel
and the Sheather and Jones (1991) direct plug-in bandwidth were used. Note
that the scales on both axes vary considerably across the three panels.

Fig. 2. Image of "mode space." The figure shows the numbers of modes of density
estimates computed from the sample {−1, 0, 1}, using the kernel
$K_\theta(x) = C_\theta (1 - x^2)^\theta$. Values of $\theta$ and h are indicated on the horizontal and
vertical axes, respectively. Mode counts range from 1 (light grey, upper left part of the
figure), through 3 (medium grey, lower right part), to 6 (black).

Fig. 3. Six modes for estimates from the sample {−1, 0, 1} using
$K_{5/2}(x) = C_{5/2} (1 - x^2)^{5/2}$. Panel (a) depicts part of the mode tree, while panel
(b) shows the estimate using h = 1.02. Panel (c) displays a close-up of the modes from
the same estimate.

Panel (a) of Figure 4, for the case of the Epanechnikov kernel, shows that
for all three sample sizes, nonmonotonicity tends to occur at a bandwidth
that is close to $h_{\mathrm{crit}}$. The biweight-kernel results presented in panel (b) show
that nonmonotonicities are still very common, but that they now often
appear at a bandwidth which is significantly smaller than $h_{\mathrm{crit}}$. By way of
contrast, panel (c) reveals that the triweight kernel is much less susceptible
to nonmonotonicity, and that in this case sample size plays a larger role.
The large peaks on the right in all three triweight estimates appear to be
artifacts due to the discretized nature of the original estimates (400 points
on [−1, 1]). It appears possible that a triweight kernel-based estimate suffers
relatively few effective nonmonotonicities.

4.2. Bump hunting. In this section we summarize numerical information
about the extent to which level accuracy of Silverman's bandwidth test,
discussed in Section 3, is influenced by kernel type. Epanechnikov, biweight,
triweight and Gaussian kernels are treated. It is well known that the
Gaussian kernel produces asymptotically conservative tests, in the sense that the
asymptotic level $\pi(\alpha)$, defined at (3.5), tends to be less than $\alpha$. It is of
interest to learn what happens for other kernels.

Figure 5 illustrates results in the case of data simulated from the Beta(3,4)
distribution. The value of $h_{\mathrm{crit}}$ was found by grid search. The bootstrap
form, $h^*_{\mathrm{crit}}$, of $h_{\mathrm{crit}}$ was calculated by averaging over 500 bootstrap resamples
from each sample, and the value of $\pi(\alpha)$ was approximated by averaging
results over 100 replicates. The resulting curve approximations were slightly
smoothed to reduce variability.




Fig. 4. Estimates of probability density of $\log(R)$, where $R = h_{\mathrm{crit}}/h_{\mathrm{nonm}}$.
Panels (a)-(c) correspond to the (a) Epanechnikov, (b) biweight and (c) triweight
kernels, respectively. Sample sizes were n = 10 (represented by the dotted line),
n = 100 (dashed line), and n = 1000 (solid line).
Panels (a) and (b) in Figure 5 correspond to n = 100 and n = 10000,
respectively. Each panel displays four approximations to $\pi(\alpha)$, indicated on
the vertical axis, as functions of $\alpha$. The four curves represent the
Epanechnikov (unbroken line), biweight (dotted line), triweight (dot-dash line) and
Gaussian (dashed line) kernels. The conservative nature of the bandwidth
test is indicated by the fact that each curve lies below the diagonal, with
little to distinguish the different kernels. This lends support to the view that
using non-Gaussian kernels when testing for modality does not substantially
alter the conclusions of a test. The conservatism could be alleviated by using
any of several available corrections, for example, that suggested by Hall and
York (2001).

5. Technical arguments. 

5.1. Proof of Theorem 2.1. Without loss of generality the mode of $K$
equals 0. It suffices to show that for each n there exists $\varepsilon = \varepsilon(n) > 0$ such
that, whenever $X_1, \ldots, X_n$ come from a continuous distribution supported on
$[-\varepsilon, \varepsilon]$ [call this assumption (A)], the mixture density
$g(x) = n^{-1} \sum_i K(x + X_i)$ is strictly unimodal with probability 1.

If (A) holds then $g'(x) \ge 0$ whenever $x < -\varepsilon$, and equality occurs if and
only if $K'(x + X_i) = 0$ for each i, which by assumption is true only at points
of inflection of $K(\cdot + X_i)$ (by assumption there is only a finite number of
these) or at points outside the support of $K(\cdot + X_i)$. Therefore if (A) holds
then $g' \ge 0$ on $(-\infty, -\varepsilon)$ and $g' \le 0$ on $(\varepsilon, \infty)$, with equality holding in either
case only outside the support of $g$ or at points of inflection inside the support
of $g$, there being at most a finite number of these. Call this property (P).

Fig. 5. Level accuracy of bandwidth test. The four curves in each panel represent
numerical approximations to levels of the bandwidth test when the Epanechnikov, biweight,
triweight or Gaussian kernel is used to implement the test. Line types are as indicated in
boxes. Panels (a) and (b) are for n = 100 and n = 10000, respectively; the horizontal
axis in each panel is $\alpha$.
Since, for some $\eta > 0$, $K$ is concave in the neighborhood $(m - \eta, m + \eta)$
of its mode $m$, then, provided assumption (A) holds for sufficiently small
$\varepsilon$, $g$ is concave in $(m - \tfrac12\eta, m + \tfrac12\eta)$. Combining this property with (P) we
deduce that $g$ is strictly unimodal on its support, except for the possibility that
the set of points that gives a maximum of $g$ forms a nondegenerate interval.
However, this entails $\sum_i K'(x + X_i) = 0$ for all $x$ in that interval, which,
since the sampled distribution is continuous and $K$ is strictly unimodal,
holds with probability 0.


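5.2. Proof of Theorem 2.2. Let $0 < \varepsilon < \tfrac12$ be such that $K''(1 - \varepsilon) < 0$;
condition (2.4) guarantees that such an $\varepsilon$ exists. Now,

    $g_\varepsilon(x) = \tfrac12 \{ K(-1 + \varepsilon + x) + K(1 - \varepsilon + x) \}$
    $\phantom{g_\varepsilon(x)} = K(1 - \varepsilon) + \tfrac12 x^2 K''(1 - \varepsilon) + o(x^2)$

as $x \to 0$. Therefore $K(-1 + \varepsilon + x) + K(1 - \varepsilon + x)$ is strictly
concave in a neighborhood of the origin. It follows that the density $g_\varepsilon$ has
at least three modes: one at the origin and one at each of the points $\pm(1 - \varepsilon)$.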

Equivalently, the density

(5.1)    $(2h)^{-1} K\{(x - X_1)/h\} + (2h)^{-1} K\{(x - X_2)/h\}$,

equal to the kernel density estimator computed from the sample $\{X_1, X_2\}$
of size n = 2, has at least three modes for those h in the range
$\tfrac12 |X_1 - X_2| < h < |X_1 - X_2|$ at which $K''\{|X_1 - X_2|/(2h)\} < 0$, and
has precisely two modes if $h < \tfrac12 |X_1 - X_2|$.
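
For the Epanechnikov kernel the three modes can be exhibited directly. The calculation below is an illustrative check under the assumptions $K(u) = \tfrac34 (1 - u^2)$ on $[-1, 1]$, a relabeling of the data as $X_1 = -d/2$ and $X_2 = d/2$ with $d > 0$ (chosen here for convenience), and $\tfrac12 d < h < d$.

```latex
\begin{align*}
\hat f_h(x) &= \frac{3}{8h} \Bigl\{ 2 - \frac{2x^2 + d^2/2}{h^2} \Bigr\},
  && |x| \le h - \tfrac12 d \quad \text{(both kernels contribute)}, \\
\hat f_h(x) &= \frac{3}{8h} \Bigl\{ 1 - \frac{(x - d/2)^2}{h^2} \Bigr\},
  && h - \tfrac12 d < x \le h + \tfrac12 d \quad \text{(only the kernel centred at } X_2 \text{)}.
\end{align*}
```

The first expression is concave with its maximum at $x = 0$. Because $h < d$ implies $h - \tfrac12 d < \tfrac12 d$, the point $x = d/2$ lies in the second region, where the estimator is maximised at $x = d/2$; by symmetry the same holds at $x = -d/2$. The estimator therefore has modes at $-d/2$, $0$ and $d/2$, whereas for $h < \tfrac12 d$ the two kernels do not overlap and only the modes at $\pm d/2$ remain.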

More generally, given a sample $\mathcal{X}$ of size n ≥ 2 we may order the data as
$X_{(1)} < \cdots < X_{(n)}$. Let $S_i = X_{(i+1)} - X_{(i)}$, for $1 \le i \le n - 1$, denote the ith
spacing. If the sampled distribution is continuous then with probability 1
no two spacings are equal, and so they may be ranked in order of strictly
increasing size, without ties. Let $S_{\min}$ denote the smallest spacing. Then
with probability 1 the density estimator $\hat f_h$ has at least n + 1 modes for
those h in the range $\tfrac12 S_{\min} < h < S_{\min}$ at which $K''\{S_{\min}/(2h)\} < 0$, and
has precisely n modes if $h < \tfrac12 S_{\min}$.


5.3. Proof of Theorem 3.1. Assumptions (3.4) imply that uj(t,u) has 
two continuous derivatives with respect to t, and that vT 1 ^’(tu,u) is pro¬ 
portional to u>i(t,u) = u~ 3 / 2 uj 2 (t,u) — ct , where 

u 2 (t,u) = J K"(t + v)W u (v)dv, 


c= \f"(xo)\/f(xo) 1 ^ 2 and W u (v) = —W(uv)/u 1 / 2 is a standard Brownian 
motion. If — \ < t\ < t 2 < \ then 


\[
\begin{aligned}
|\omega_2(t_1,u) - \omega_2(t_2,u)|
&\le \int_{-1-t_1}^{1-t_2} |K''(t_2 + v) - K''(t_1 + v)|\,|W_u(v)|\,dv \\
&\qquad + \int_{-1-t_2}^{-1-t_1} |K''(t_2 + v)|\,|W_u(v)|\,dv
 + \int_{1-t_2}^{1-t_1} |K''(t_1 + v)|\,|W_u(v)|\,dv \\
&\le 4(t_2 - t_1)\bigl(\sup|K''| + \sup|K'''|\bigr) S(u),
\end{aligned}
\tag{5.2}
\]

where $S(u) = \int_{-2<v<2} |W_u(v)|\,dv$. (These bounds require $K$ to have three derivatives as a function on $I$, but not as a function on the real line.) For each $\epsilon > 0$, $S(u) = O(u^\epsilon)$ with probability 1 as $u \to \infty$. Therefore, by (5.2),


\[
\sup_{-1/2 < t_1 < t_2 < 1/2}
\biggl| \frac{\omega_2(t_1,u) - \omega_2(t_2,u)}{t_1 - t_2} \biggr| = O(u^\epsilon)
\tag{5.3}
\]

with probability 1 as $u \to \infty$.

Solutions $t = \hat t$ of $\omega'(t,u) = 0$ are equivalently solutions of $\omega_1(t,u) = 0$, and may be shown by Taylor expansion to satisfy $\sup|\hat t| \to 0$ as $u \to \infty$, with probability 1, where the supremum is taken over all solutions. (It is straightforward to prove that with probability 1, at least one solution exists for all sufficiently large $u$.) Let $w > 0$ be given, and suppose the probability that for some $u > w$ at least two distinct solutions exist is bounded away from 0 (along a subsequence of values of $w$) as $w \to \infty$. Take $t_1$ and $t_2$ to be two such solutions, when $u > w$ and $w$ is an element of the subsequence. Then $u^{-3/2}\omega_2(t_j,u) = c\,t_j$ for each $j$, whence

\[
u^{-3/2}\, \frac{\omega_2(t_1,u) - \omega_2(t_2,u)}{t_1 - t_2} = c.
\tag{5.4}
\]

Result (5.3) implies, however, that with probability 1 the left-hand side of (5.4) converges to 0 as $u \to \infty$. On the other hand, the right-hand side is fixed and nonzero. This contradiction demonstrates the incorrectness of our assumption that two distinct solutions $t_1$ and $t_2$ of $\omega'(t,u) = 0$ exist, and proves the theorem.
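
The contradiction at the end of this proof can be written out in two lines. The block below is a sketch of that step only; the restriction $0 < \epsilon < 3/2$ is our choice of the free constant in (5.3), made so that the bound tends to zero.

```latex
% Subtract the identities u^{-3/2} \omega_2(t_j,u) = c t_j for j = 1, 2 and
% divide by t_1 - t_2, which is nonzero because the solutions are distinct:
\[
  u^{-3/2}\,\frac{\omega_2(t_1,u) - \omega_2(t_2,u)}{t_1 - t_2} = c .
\]
% Both solutions lie in (-1/2, 1/2) for large u, since sup|t| -> 0, so (5.3)
% applies. Fixing any 0 < \epsilon < 3/2 gives, with probability 1,
\[
  \biggl| u^{-3/2}\,\frac{\omega_2(t_1,u) - \omega_2(t_2,u)}{t_1 - t_2} \biggr|
  \le u^{-3/2}\, O(u^{\epsilon}) = O(u^{\epsilon - 3/2}) \to 0
  \quad \text{as } u \to \infty ,
\]
% whereas c = |f''(x_0)| / f(x_0)^{1/2} does not depend on u and is nonzero.
```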


5.4. Proof of Theorem 3.2. Put $h^0 = n^{-1/5}$, write $H = H(n)$ for a positive sequence such that $H(n) \to 0$ and $H(n)/h^0 \to \infty$, and redefine $h_{\rm crit}$ to be the infimum of values $0 < h_1 < H(n)$ such that $\hat f_h$ is strongly unimodal for all $h > h_1$. It is readily proved that the probability that this version of $h_{\rm crit}$, and the version defined at (2.3), are identical converges to 1 as $n \to \infty$. Therefore it is sufficient to prove that for the new version, $n^{1/5} h_{\rm crit} \to U_{\rm crit}$ in distribution.
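
The redefined critical bandwidth can be approximated on a finite grid of bandwidths. The following is a rough sketch under stated assumptions: it uses the Epanechnikov kernel as an example, treats the largest grid bandwidth as a stand-in for $H(n)$, and checks unimodality only on a finite grid of $x$ values. The names `count_modes` and `critical_bandwidth` are hypothetical helpers, not part of the paper.

```python
def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1] (assumed example kernel)."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def count_modes(sample, h, grid_size=4001):
    """Count local maxima of the kernel estimate on an equally spaced grid."""
    lo = min(sample) - 1.1 * h
    hi = max(sample) + 1.1 * h
    step = (hi - lo) / (grid_size - 1)
    n = len(sample)
    values = [
        sum(epanechnikov((lo + i * step - xi) / h) for xi in sample) / (n * h)
        for i in range(grid_size)
    ]
    # Keep only strict increases and decreases, skipping flat stretches.
    signs = [1 if b > a else -1 for a, b in zip(values, values[1:]) if a != b]
    return sum(1 for s, t in zip(signs, signs[1:]) if s > 0 and t < 0)


def critical_bandwidth(sample, bandwidths):
    """Smallest grid bandwidth h1 such that the estimate is unimodal for
    every grid bandwidth h >= h1; None if the largest one is not unimodal."""
    h_crit = None
    for h in sorted(bandwidths, reverse=True):
        if count_modes(sample, h) != 1:
            break
        h_crit = h
    return h_crit


if __name__ == "__main__":
    grid = [0.1 + 0.05 * i for i in range(39)]   # bandwidths 0.10, ..., 2.00
    print(critical_bandwidth([0.0, 1.0], grid))
```

Scanning downward from the largest bandwidth and stopping at the first multimodal estimate mirrors the "for all $h > h_1$" clause of the definition. For the two-point sample $\{0, 1\}$ the scan stops near $h = 1$, the distance between the two observations.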

Let $S_h$ denote the support of $\hat f_h$, and write $S_h^0$ for the interior of $S_h$. Using strong approximation of the empirical distribution by a Brownian bridge [Komlós, Major and Tusnády (1976)], it may be shown that for each $C_1, \epsilon > 0$ there exists $C_2 = C_2(C_1, \epsilon) > 0$ such that for all sufficiently large $n$,

\[
\begin{aligned}
P\{&\text{for all } h \in [C_1 h^0, H(n)],\ \hat f_h'(x) > 0 \text{ for } x \in S_h^0 \text{ such that } x < x_0 - C_2 h, \\
&\text{and } \hat f_h'(x) < 0 \text{ for } x \in S_h^0 \text{ such that } x > x_0 + C_2 h\} > 1 - \epsilon.
\end{aligned}
\tag{5.5}
\]

The method of proof consists of showing first that for each $h$, $E\{\hat f_h'(x)\}$ is strictly positive on $(a - h, x_0 - C_2 h)$ and strictly negative on $(x_0 + C_2 h, b + h)$, and thence demonstrating that for each $C_1, \epsilon > 0$ there exists $C_2 = C_2(C_1, \epsilon) > 0$ such that for all sufficiently large $n$,

\[
\begin{aligned}
P\{&\text{for all } h \in [C_1 h^0, H(n)] \text{ and all } x \in (a - h, b + h) \text{ for which } |x - x_0| > C_2 h, \\
&|\hat f_h'(x) - E\{\hat f_h'(x)\}| \le \tfrac12 |E\{\hat f_h'(x)\}|\} > 1 - \epsilon.
\end{aligned}
\]

The same strong approximation methods may be used to prove that for each $C_2, \epsilon > 0$ there exists $C_3 = C_3(C_2, \epsilon) > 0$ such that for all sufficiently large $n$,

\[
P\Bigl\{ \sup_{h \in [C_3 h^0, H(n)]}\ \sup_{|x - x_0| \le C_2 h} |\hat f_h''(x) - E\{\hat f_h''(x)\}| \le \epsilon \Bigr\} > 1 - \epsilon.
\tag{5.6}
\]

[In each case the arguments are broadly similar to those of Mammen, Marron and Fisher (1992). See also Silverman (1983).]

Note too that for each $C_1 > 0$, $E\{\hat f_h''(x)\} = f''(x_0) + o(1)$ uniformly in $|x - x_0| \le C_1 h$ and $h \le H(n)$. Combining this result with (5.6) we deduce that for each $C_2, \epsilon > 0$ there exists $C_3 = C_3(C_2, \epsilon) > 0$ such that for all sufficiently large $n$,

\[
P\{\text{for all } h \in [C_3 h^0, H(n)],\ \hat f_h \text{ is strictly concave on } (x_0 - C_2 h, x_0 + C_2 h)\} > 1 - \epsilon.
\tag{5.7}
\]


Together (5.5) and (5.7) imply that for each $\epsilon > 0$ there exists $C_3 = C_3(\epsilon) > 0$ such that for all sufficiently large $n$,

\[
P\{\text{for all } h \in [C_3 h^0, H(n)],\ \hat f_h \text{ is strictly unimodal on its support}\} > 1 - \epsilon.
\tag{5.8}
\]


Strong approximation methods may also be used to prove the existence 
of a Brownian motion W such that, defining h u = uh°, 

\[
A_j(t,u) = n^{(2-j)/5}\bigl\{\hat f_{h_u}^{(j)}(x_0 + h^0 t) - E \hat f_{h_u}^{(j)}(x_0 + h^0 t)\bigr\}
\quad\text{and}\quad
a_j(t,u) = f(x_0)^{1/2}\, u^{-(j+1)} \int K^{(j)}\{(t + v)/u\}\,dW(v),
\]
where $j = 0$, 1 or 2, we have for each $0 < C_1 < C_3 < \infty$ and each $C_2 > 0$,

\[
P\Bigl\{ \sup_{|t| \le C_2}\ \sup_{C_1 \le u \le C_3} |A_j(t,u) - a_j(t,u)| > n^{-\delta} \Bigr\} = O(n^{-\lambda})
\tag{5.9}
\]

for some $\delta > 0$ and all $C_2, \lambda > 0$. Observe too that if $j = 1, 2$,

\[
n^{(2-j)/5}\bigl\{E \hat f_{h_u}^{(j)}(x_0 + h^0 t) - f^{(j)}(x_0)\bigr\} - t f''(x_0)\, I(j = 1) \to 0
\tag{5.10}
\]

uniformly in $|t| \le C_2$ and $C_1 \le u \le C_3$.

Denote by $N = N(C_2, u)$ the number of crossings of 0 made by the process $a_1(\cdot, u)$ in $[-C_2, C_2]$, and let the crossings be $T_1(u), \ldots, T_N(u)$. For each $C_2 > 0$ and $0 < C_1 < C_3 < \infty$, the value of $\sup_{C_1 \le u \le C_3} N(C_2, u)$ is finite with probability 1. The continuous, nondegenerate property of the joint distributions of $a_1(\cdot, u)$ and $a_2(\cdot, u)$ implies that

\[
\lim_{\epsilon \to 0} P[\,|a_2\{T_i(u), u\}| > \epsilon \text{ for } 1 \le i \le N(C_2, u) \text{ and } C_1 \le u \le C_3\,] = 1.
\]

Note too that $\omega'(t,u) = a_1(t,u) + t f''(x_0)$. The results immediately above, and (5.9) and (5.10), imply that

\[
\begin{aligned}
P\{&\text{for each } C_1 \le u \le C_3, \text{ the number of downcrossings of 0 made by } \hat f_{h_u}'(x_0 + h^0 t) \text{ for } t \in [-C_2, C_2] \\
&\text{equals the number of downcrossings of 0 made by } \omega'(t,u) \text{ for } t \in [-C_2, C_2]\} \to 1
\end{aligned}
\tag{5.11}
\]

as $n \to \infty$.

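Analogously to (5.8), but more simply, it may be shown that for each $\epsilon > 0$ there exists $C_3 = C_3(\epsilon) > 0$ such that

\[
\begin{aligned}
P\{&\text{for all } u \ge C_3, \text{ there is a unique } t = \hat t \in (-\infty, \infty) \text{ at which } \omega'(t,u) \text{ vanishes,} \\
&\text{and } \hat t \text{ is a downcrossing}\} > 1 - \epsilon.
\end{aligned}
\tag{5.12}
\]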


Combining (5.5), (5.8), (5.11) and (5.12), we deduce that for all $C_1 > 0$,

\[
\begin{aligned}
P\{&\text{for all } h \in [C_1 h^0, H(n)], \text{ the number of downcrossings of 0 made by } \hat f_h' \\
&\text{equals the number of downcrossings of 0 made by } \omega'(t,u)\} > 1 - \epsilon.
\end{aligned}
\]


The theorem follows from this result and (5.12). 